A Keras neural network and the MNIST dataset

Author: Leonardo Espin

Date: 10/2/2019

Below I train a standard feed-forward neural network and a convolutional neural network on the MNIST handwritten digits dataset.

from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

Loading the MNIST dataset

(x_train_i, y_train), (x_test_i, y_test) = keras.datasets.mnist.load_data()
print(x_train_i.shape)
print(x_test_i.shape)
(60000, 28, 28)
(10000, 28, 28)
#check one of the images 
img_rows, img_cols = 28,28

Preprocessing the data

#normalize the data to 0-1:
x_train_i = x_train_i/ 255
x_test_i = x_test_i/255
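
As a small illustration of the scaling step, the sketch below applies the same division to a made-up 2x2 array. The name `pixels` and its values are hypothetical stand-ins for an MNIST image, and the `float32` cast is an optional extra that the notebook does not use.

```python
import numpy as np

# Hypothetical stand-in for a tiny uint8 image; MNIST pixels use the same 0-255 range.
pixels = np.array([[0, 51], [204, 255]], dtype=np.uint8)

scaled = pixels / 255                         # true division promotes uint8 to float64
scaled32 = (pixels / 255).astype(np.float32)  # optional cast that halves the memory use

print(scaled.dtype, scaled.min(), scaled.max())
```

Dividing by the largest possible pixel value maps every image onto the interval from 0 to 1 without changing the relative brightness of its pixels.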

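#reshaping for feeding the NN: each 28x28 image becomes a vector of 784 values
x_train = x_train_i.reshape(x_train_i.shape[0], img_rows*img_cols)
x_test = x_test_i.reshape(x_test_i.shape[0], img_rows*img_cols)

# convert class integers to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print(y_train.shape)
print(y_test.shape)
(60000, 10)
(10000, 10)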
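
To make the "binary class matrix" concrete, here is a minimal NumPy sketch of the same kind of encoding. It is not the Keras implementation; the three labels are arbitrary examples, and indexing an identity matrix is just one convenient way to build the rows.

```python
import numpy as np

num_classes = 10
labels = np.array([5, 0, 4])  # example digit labels

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot.shape)
print(one_hot[0])
```

Each row contains a single 1 at the position of its class, which is why the label arrays above have 10 columns.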

Building a standard feed-forward DNN

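from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
#The model needs to know what input shape it should expect
model.add(Dense(25,  #input size 28*28, output is of size 25 to match a hidden layer
                activation='relu', #assumed activation
                input_shape=(img_rows*img_cols,)))

model.add(Dense(25, #number of nodes in this dense layer
                activation='relu')) #assumed activation

#the prediction layer. note that we convert outputs into probabilities
model.add(Dense(num_classes, #number of prediction classes
                activation='softmax')) #softmax turns the outputs into probabilities

model.summary()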

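Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 25)                19625     
_________________________________________________________________
dense_1 (Dense)              (None, 25)                650       
_________________________________________________________________
dense_2 (Dense)              (None, 10)                260       
=================================================================
Total params: 20,535
Trainable params: 20,535
Non-trainable params: 0
_________________________________________________________________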
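
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output. The helper name `dense_params` below is my own, introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three dense layers of the model above
layer_sizes = [(28 * 28, 25), (25, 25), (25, 10)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(counts, sum(counts))  # [19625, 650, 260] 20535
```

Almost all of the parameters sit in the first layer, because it connects every one of the 784 pixels to each hidden node.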

Fitting the DNN model

#configuring the learning process
model.compile(loss='categorical_crossentropy',#logarithmic loss for multi-class classification
              optimizer='adam',#gradient descent variant that adapts the step size of each parameter
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=100,#number of images for each gradient descent step
          epochs=20,#an epoch is one pass through the training data, so each image is seen 20 times
          validation_split = 0.2)#hold out the last 20% of the training data for validation
Train on 48000 samples, validate on 12000 samples
Epoch 1/20
48000/48000 [==============================] - 4s 77us/sample - loss: 0.5390 - accuracy: 0.8479 - val_loss: 0.2813 - val_accuracy: 0.9208
Epoch 2/20
48000/48000 [==============================] - 3s 61us/sample - loss: 0.2554 - accuracy: 0.9277 - val_loss: 0.2148 - val_accuracy: 0.9402
Epoch 3/20
48000/48000 [==============================] - 3s 58us/sample - loss: 0.2114 - accuracy: 0.9404 - val_loss: 0.1981 - val_accuracy: 0.9445
Epoch 4/20
48000/48000 [==============================] - 3s 61us/sample - loss: 0.1839 - accuracy: 0.9473 - val_loss: 0.1816 - val_accuracy: 0.9480
Epoch 5/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.1654 - accuracy: 0.9520 - val_loss: 0.1669 - val_accuracy: 0.9525
Epoch 6/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.1498 - accuracy: 0.9567 - val_loss: 0.1558 - val_accuracy: 0.9548
Epoch 7/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.1369 - accuracy: 0.9603 - val_loss: 0.1516 - val_accuracy: 0.9569
Epoch 8/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.1274 - accuracy: 0.9630 - val_loss: 0.1511 - val_accuracy: 0.9567
Epoch 9/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.1184 - accuracy: 0.9657 - val_loss: 0.1445 - val_accuracy: 0.9584
Epoch 10/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.1102 - accuracy: 0.9675 - val_loss: 0.1455 - val_accuracy: 0.9572
Epoch 11/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.1033 - accuracy: 0.9698 - val_loss: 0.1510 - val_accuracy: 0.9547
Epoch 12/20
48000/48000 [==============================] - 3s 61us/sample - loss: 0.0994 - accuracy: 0.9704 - val_loss: 0.1363 - val_accuracy: 0.9608
Epoch 13/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.0916 - accuracy: 0.9728 - val_loss: 0.1370 - val_accuracy: 0.9609
Epoch 14/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.0871 - accuracy: 0.9736 - val_loss: 0.1379 - val_accuracy: 0.9605
Epoch 15/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.0830 - accuracy: 0.9753 - val_loss: 0.1334 - val_accuracy: 0.9620
Epoch 16/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.0794 - accuracy: 0.9762 - val_loss: 0.1306 - val_accuracy: 0.9628
Epoch 17/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.0760 - accuracy: 0.9769 - val_loss: 0.1403 - val_accuracy: 0.9619
Epoch 18/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.0721 - accuracy: 0.9780 - val_loss: 0.1332 - val_accuracy: 0.9622
Epoch 19/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.0682 - accuracy: 0.9793 - val_loss: 0.1523 - val_accuracy: 0.9572
Epoch 20/20
48000/48000 [==============================] - 3s 61us/sample - loss: 0.0659 - accuracy: 0.9796 - val_loss: 0.1383 - val_accuracy: 0.9616
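
The log suggests that the validation loss flattens out well before the training loss does. One quick way to check this, sketched here with the `val_loss` values copied by hand from the log above, is to look for the epoch with the lowest validation loss.

```python
# Validation losses copied from the training log above, one per epoch.
val_loss = [0.2813, 0.2148, 0.1981, 0.1816, 0.1669, 0.1558, 0.1516, 0.1511,
            0.1445, 0.1455, 0.1510, 0.1363, 0.1370, 0.1379, 0.1334, 0.1306,
            0.1403, 0.1332, 0.1523, 0.1383]

# Index of the smallest value, shifted by one because epochs are numbered from 1.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(best_epoch, val_loss[best_epoch - 1])
```

In this run the minimum falls at epoch 16, while the training loss keeps decreasing through epoch 20, which hints at mild overfitting in the last few epochs.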

Testing the model on unseen data

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Test loss: 0.13213677672185004
Test accuracy: 0.9637
#the conversion below is to remove the error
#"expected dense_input to have shape (784,)..."

#array dimensions:
array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.], dtype=float32)
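
A shape error like the one quoted above typically appears when a single image is passed to the model without a batch axis. The sketch below shows the reshape in plain NumPy, using a zero-filled placeholder named `x_test` (an assumption that only mimics the flattened test-set shape), and decodes the one-hot label printed above with `argmax`.

```python
import numpy as np

# Placeholder with the shape of the flattened test set; the values are not real images.
x_test = np.zeros((10000, 784), dtype=np.float32)

single = x_test[0]                    # shape (784,): no batch axis
batch_of_one = single.reshape(1, -1)  # shape (1, 784): one sample with 784 features

# One-hot label as printed above; the position of the 1 is the digit.
label = np.array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.], dtype=np.float32)
digit = int(np.argmax(label))

print(single.shape, batch_of_one.shape, digit)
```

Slicing with `x_test[0:1]` would give the same `(1, 784)` shape without an explicit reshape.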

Building a convolutional neural network

The network below has two extra convolutional layers. Their effect is dramatic: the number of training epochs drops from 20 for the feed-forward network above to 4 below, although each epoch takes much longer (about 44 s instead of 3 s). Despite the shorter training schedule, accuracy improves. Training accuracy after the first epoch rises from about 85% to 93%, and the final test accuracy increases by two percentage points, from 96.4% to 98.4%.
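
To give an idea of what a single convolutional filter computes, here is a minimal NumPy sketch of an unpadded, stride-1 convolution. The function `conv2d_valid`, the half-dark test image, and the edge-detecting kernel are all illustrative assumptions; a Keras `Conv2D` layer learns its kernels during training, adds a bias, and sums over input channels.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((28, 28))
image[:, 14:] = 1.0                        # hypothetical image: dark left half, bright right half
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # hypothetical vertical-edge detector

response = conv2d_valid(image, kernel)
print(response.shape)  # (26, 26)
```

The response is nonzero only where the 3x3 window straddles the edge between the two halves, which illustrates how a filter reacts to a local pattern wherever it occurs in the image.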

from tensorflow.keras.layers import Flatten, Conv2D
cnv_model = Sequential()
cnv_model.add(Conv2D(12,                #number of convolutional filters
                     kernel_size=(3, 3),#shape of convolution kernel
                     input_shape=(img_rows, img_cols, 1)))

#another convolutional layer
cnv_model.add(Conv2D(20,kernel_size=(3, 3),
                     activation='relu')) #assumed activation

#removing an extra convolution layer resulted in slightly improved accuracy O(1e-3)

#the flattening layer converts the output of the previous layers
#into a 1D representation for each image
cnv_model.add(Flatten())

#this layer has an order of magnitude more parameters than the dense layer
#in the previous model because its input is the flattened 24*24*20 = 11520
#feature map instead of the 28*28 = 784 pixels
cnv_model.add(Dense(25, #number of nodes in this dense layer
                    activation='relu')) #assumed activation
#the prediction layer. note that we convert outputs into probabilities
cnv_model.add(Dense(num_classes, #number of prediction classes
                    activation='softmax')) #softmax turns the outputs into probabilities

cnv_model.summary()
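Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 12)        120       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 20)        2180      
_________________________________________________________________
flatten (Flatten)            (None, 11520)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 25)                288025    
_________________________________________________________________
dense_4 (Dense)              (None, 10)                260       
=================================================================
Total params: 290,585
Trainable params: 290,585
Non-trainable params: 0
_________________________________________________________________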
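
The shapes and parameter counts in this summary follow from simple arithmetic, sketched below. The helper names are my own. The formulas assume stride 1 and no padding, which is consistent with the output shapes reported in the summary.

```python
def conv_output_size(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def conv_params(kernel, channels_in, filters):
    """One kernel per (input channel, filter) pair plus one bias per filter."""
    return kernel * kernel * channels_in * filters + filters

def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

size1 = conv_output_size(28, 3)     # 26
size2 = conv_output_size(size1, 3)  # 24
flat = size2 * size2 * 20           # 11520 values after flattening

counts = [conv_params(3, 1, 12), conv_params(3, 12, 20),
          dense_params(flat, 25), dense_params(25, 10)]
print(flat, counts, sum(counts))  # 11520 [120, 2180, 288025, 260] 290585
```

The convolutional layers themselves are cheap. Nearly all of the parameters come from connecting the 11520 flattened values to the 25 nodes of the first dense layer.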
#assumed: same loss, optimizer and metric as for the DNN above
cnv_model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
#reshaping the input is necessary because of the convolutions:
#the 4th dimension is for the single color channel (gray-scale images)
cnv_model.fit(x_train_i.reshape(x_train_i.shape[0],img_rows,img_cols,1), y_train,
              batch_size=100,#number of images for each gradient descent step
              epochs=4,      #notice the smaller number of epochs!
              validation_split = 0.2)
Train on 48000 samples, validate on 12000 samples
Epoch 1/4
48000/48000 [==============================] - 41s 856us/sample - loss: 0.2318 - accuracy: 0.9303 - val_loss: 0.0929 - val_accuracy: 0.9744
Epoch 2/4
48000/48000 [==============================] - 44s 911us/sample - loss: 0.0692 - accuracy: 0.9794 - val_loss: 0.0642 - val_accuracy: 0.9825
Epoch 3/4
48000/48000 [==============================] - 44s 924us/sample - loss: 0.0470 - accuracy: 0.9852 - val_loss: 0.0626 - val_accuracy: 0.9817
Epoch 4/4
48000/48000 [==============================] - 45s 944us/sample - loss: 0.0346 - accuracy: 0.9892 - val_loss: 0.0614 - val_accuracy: 0.9822
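
The reshape passed to `fit` above only appends a channel axis of length 1; it does not change any pixel values. A minimal NumPy sketch, with a zero-filled placeholder batch named `images` standing in for the real data:

```python
import numpy as np

images = np.zeros((5, 28, 28))  # hypothetical mini-batch of 5 gray-scale images

with_channel = images.reshape(images.shape[0], 28, 28, 1)
also_with_channel = images[..., np.newaxis]  # equivalent, without repeating the sizes

print(with_channel.shape, also_with_channel.shape)
```

Either form produces the (samples, rows, columns, channels) layout that the first convolutional layer was declared to expect.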

score2 = cnv_model.evaluate(x_test_i.reshape(x_test_i.shape[0],img_rows,img_cols,1),
                            y_test, verbose=0)
print('Test loss:', score2[0])
print('Test accuracy:', score2[1])
Test loss: 0.0518858770695515
Test accuracy: 0.9843
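
A two-point gain in accuracy may look small, so it can help to express both results as error counts on the 10,000 test images. The sketch below uses the two test accuracies reported above; the variable names are my own.

```python
test_size = 10000
acc_dnn, acc_cnn = 0.9637, 0.9843  # test accuracies reported above

errors_dnn = round((1 - acc_dnn) * test_size)  # misclassified test images, feed-forward network
errors_cnn = round((1 - acc_cnn) * test_size)  # misclassified test images, convolutional network
relative_reduction = 1 - errors_cnn / errors_dnn

print(errors_dnn, errors_cnn, round(relative_reduction, 2))
```

Seen this way, the convolutional network misclassifies fewer than half as many test images as the feed-forward network.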
